Hierarchical multilayer perceptron based language identification

Authors

  • David Imseng
  • Mathew Magimai-Doss
  • Hervé Bourlard
Abstract

Automatic language identification (LID) systems generally exploit acoustic knowledge, possibly enriched by explicit language specific phonotactic or lexical constraints. This paper investigates a new LID approach based on hierarchical multilayer perceptron (MLP) classifiers, where the first layer is a “universal phoneme set MLP classifier”. The resulting (multilingual) phoneme posterior sequence is fed into a second MLP taking a larger temporal context into account. The second MLP can learn/exploit implicitly different types of patterns/information such as confusion between phonemes and/or phonotactics for LID. We investigate the viability of the proposed approach by comparing it against two standard approaches which use phonotactic and lexical constraints with the universal phoneme set MLP classifier as emission probability estimator. On SpeechDat(II) datasets of five European languages, the proposed approach yields significantly better performance compared to the two standard approaches.
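
The data flow described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the weights are random and untrained, and the feature dimension (39), phoneme set size (40), hidden layer size (16), temporal context (4 frames on each side) and the log-posterior decision rule are all assumptions made for illustration. Only the number of languages (five) comes from the abstract. All function names are hypothetical.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_mlp(n_in, n_hidden, n_out, rng):
    """Random (untrained) weights for a one-hidden-layer MLP; illustration only."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"w1": matrix(n_hidden, n_in), "w2": matrix(n_out, n_hidden)}

def mlp_posteriors(mlp, x):
    """Forward pass: tanh hidden layer followed by a softmax output layer."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in mlp["w1"]]
    return softmax([sum(w * h for w, h in zip(row, hidden)) for row in mlp["w2"]])

def stack_context(frames, context):
    """Concatenate each frame with its +/- context neighbours (edge frames repeat)."""
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window = []
        for k in range(t - context, t + context + 1):
            window.extend(frames[min(max(k, 0), last)])
        stacked.append(window)
    return stacked

def identify_language(features, phoneme_mlp, language_mlp, context):
    """First MLP: frame -> phoneme posteriors. Second MLP: posterior window -> language."""
    phoneme_post = [mlp_posteriors(phoneme_mlp, f) for f in features]
    windows = stack_context(phoneme_post, context)
    language_post = [mlp_posteriors(language_mlp, w) for w in windows]
    n_lang = len(language_post[0])
    # Assumed decision rule: sum log posteriors over frames and pick the best language.
    scores = [sum(math.log(p[lang]) for p in language_post) for lang in range(n_lang)]
    return scores.index(max(scores)), language_post

# Assumed sizes; only N_LANGUAGES = 5 is taken from the abstract.
N_FEAT, N_PHONEMES, N_LANGUAGES, CONTEXT = 39, 40, 5, 4
rng = random.Random(0)
phoneme_mlp = make_mlp(N_FEAT, 16, N_PHONEMES, rng)
language_mlp = make_mlp(N_PHONEMES * (2 * CONTEXT + 1), 16, N_LANGUAGES, rng)

# Twenty synthetic acoustic frames stand in for a real utterance.
features = [[rng.gauss(0, 1) for _ in range(N_FEAT)] for _ in range(20)]
best, language_post = identify_language(features, phoneme_mlp, language_mlp, CONTEXT)
print("predicted language index:", best)
```

The point of the sketch is the input to the second classifier: it sees a window of 2 * CONTEXT + 1 consecutive phoneme posterior vectors rather than a single frame, which is what would let a trained network pick up phoneme confusions and phonotactic patterns over time.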

Similar resources

Use of recurrent network for unknown language rejection in language identification system

In the past, we attempted to use a multilayer perceptron neural network as a means to prevent those unknown language inputs from being misidentified as one of the target languages in language identification system. However, the use of multilayer perceptron neural network could not utilize the temporal information from the utterances. Results show that with the use of phonemic unigram as input f...

UTA DLNLP at SemEval-2016 Task 12: Deep Learning Based Natural Language Processing System for Clinical Information Identification from Clinical Notes and Pathology Reports

We propose a deep neural network based natural language processing system for clinical information (such as time information, event spans, and their attributes) extraction from raw clinical notes and pathology reports. Our approach uses the context words and their part-of-speech tags and shape information as features. We utilize the temporal (1D) convolution neural network to learn the hidden fe...

Unknown language rejection in language identification system

The number of languages in the world is much larger than the number of target languages that current language identification systems can handle. Therefore, we propose here the use of a multilayer perceptron neural network as a means to prevent those unknown language inputs from being misidentified as one of the target languages. We consider not only the target language identification rate but also ...

Classifier Stacking for Native Language Identification

This paper reports our contribution (team WLZ) to the NLI Shared Task 2017 (essay track). We first extract lexical and syntactic features from the essays, perform feature weighting and selection, and train linear support vector machine (SVM) classifiers each on an individual feature type. The outputs of the base classifiers, as probabilities for each class, are then fed into a multilayer perceptron ...

Publication date: 2010